Epigenetic scores derived in saliva are associated with gestational age at birth

Background
Epigenetic scores (EpiScores), DNA methylation (DNAm)-based surrogates for complex traits, have been developed for multiple circulating proteins. EpiScores for pro-inflammatory proteins, such as C-reactive protein (DNAm CRP), are associated with brain health and cognition in adults and with inflammatory comorbidities of preterm birth in neonates. Social disadvantage can become embedded in child development through inflammation, and deprivation is overrepresented among preterm infants. We tested the hypotheses that preterm birth and socioeconomic status (SES) are associated with alterations in a set of EpiScores enriched for inflammation-associated proteins.
Results
In total, 104 protein EpiScores were derived from saliva samples of 332 neonates born at gestational age (GA) 22.14 to 42.14 weeks. GA at saliva sampling ranged from 36.57 to 47.14 weeks. Forty-three (41%) EpiScores were associated with low GA at birth (standardised estimates |0.14 to 0.88|, Bonferroni-adjusted p-value < 8.3 × 10⁻³). These included EpiScores for chemokines, growth factors, proteins involved in neurogenesis and vascular development, cell membrane proteins and receptors, and other immune proteins. Three EpiScores were associated with SES or with the interaction between birth GA and SES: afamin, intercellular adhesion molecule 5, and hepatocyte growth factor-like protein (standardised estimates |0.06 to 0.13|, Bonferroni-adjusted p-value < 8.3 × 10⁻³). In a preterm subgroup (n = 217, median [range] GA 29.29 weeks [22.14 to 33.0 weeks]), SES–EpiScore associations were no longer statistically significant after adjustment for sepsis, bronchopulmonary dysplasia, necrotising enterocolitis, and histological chorioamnionitis.
Conclusions
Low birth GA is substantially associated with a set of EpiScores. The set was enriched for inflammatory proteins, providing new insights into immune dysregulation in preterm infants. SES had fewer associations with EpiScores; these tended to have small effect sizes and were not statistically significant after adjustment for inflammatory comorbidities. This suggests that inflammation is unlikely to be the primary axis through which SES becomes embedded in the development of preterm infants in the neonatal period.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13148-024-01701-2.


Background
Preterm birth (delivery < 37 weeks' gestation) affects around 10% of births worldwide and is closely associated with an increased likelihood of cerebral palsy, neurocognitive impairment, behavioural, social and communication difficulties, and mental and cardiometabolic health diagnoses across the life course [1][2][3][4][5]. These adverse outcomes can be explained, in part, by the deleterious effects of early exposure to extrauterine life on brain and cardiac development, and they are often accompanied by changes in blood proteins, including those reflecting the perinatal innate and adaptive immune response [6][7][8].
Socioeconomic status (SES) is also associated with the adverse neurodevelopmental and health outcomes listed above [9][10][11][12], and social deprivation is consistently overrepresented among preterm children and their families [13,14]. In a meta-analysis of 43 studies (n = 111,156 individuals), low SES was associated with increased levels of inflammatory markers of disease risk (C-reactive protein [CRP] and interleukin-6 [IL6]), which suggests that pro-inflammatory pathways may be important mechanisms for translating social inequalities into health disparities [15]. However, only four of these studies included participants under 10 years of age, leaving uncertainty about SES–inflammation correlations in early life [16][17][18][19].
Although protein levels are commonly used as biomarkers of exposure and disease risk, their use is limited because circulating levels are often phasic, measurement relies on venepuncture, and a single measurement may not capture baseline status or chronicity. For example, inflammation is often measured using acute-phase inflammatory proteins such as CRP [20,21], but these measures are not always reliable [22], particularly in neonates, and a single-time-point measure may not reflect baseline inflammation or capture chronic inflammation [23]. These challenges have been addressed by the development of DNA methylation (DNAm) markers of protein expression (EpiScores), which are calculated as a weighted linear sum of methylation levels at CpG sites that correlate with protein levels. Several EpiScores are associated with magnetic resonance imaging (MRI) measures of brain health, cognition, child mental health, stroke, ischaemic heart disease, Alzheimer's disease, and lung cancer [24][25][26][27][28][29][30][31]. In neonates, DNAm CRP is associated with birth gestational age (GA), perinatal inflammatory processes, and MRI features of encephalopathy of prematurity [32]. Childhood SES is associated with differential DNAm in inflammation-related genes [33,34] and at CpG sites that correlate with an inflammation index [35]. Adult SES and social mobility are associated with variations in DNAm in inflammation-related genes [33,36]. Importantly, SES-related DNAm variations are associated with differences in gene expression and so may have functional consequences [33,36].
Several maternal factors are associated with DNAm in term infants sampled soon after birth, including maternal smoking [37], diabetes [38,39], obesity [40,41], and mode of delivery [42,43]. It is unknown whether these associations apply to preterm infants, who have curtailed in utero exposure and are sampled after prolonged exposure to neonatal intensive care, which is known to have widespread effects on the methylome [44].
We investigated relationships between preterm birth, SES, and 104 EpiScores enriched for inflammation-related proteins [26-28, 31, 45]. We tested the following hypotheses: first, low GA is associated with differences in EpiScores; and second, SES is correlated with EpiScores and interacts with birth GA, but the relationship is attenuated by inflammatory disease burden in preterm infants.

Participants
Participants were preterm infants (born at ≤ 33 weeks' gestation) and term-born infants delivered at the Royal Infirmary of Edinburgh, UK. These infants were recruited to a longitudinal cohort study with multimodal data collection, designed to investigate the effect of preterm birth on brain development and outcomes [46]. Infants were recruited between February 2012 and December 2021.
Exclusion criteria were congenital malformation, chromosomal abnormality, congenital infection, cystic periventricular leukomalacia, haemorrhagic parenchymal infarction, and post-haemorrhagic ventricular dilatation. These criteria mean that the cohort is representative of the majority of survivors of modern intensive care practices [46].
The final sample comprised 217 preterm infants (born at ≤ 33 weeks' gestation) and 115 term-born infants, with median birth GA of 29.29 and 39.71 weeks, respectively. Their demographic characteristics are shown in Table 1. The three SES measures (Scottish Index of Multiple Deprivation (SIMD 2016) [47], maternal education, and maternal occupation) differed between the preterm and term groups (Cohen's d effect sizes 0.52-0.68). Ethnic composition did not differ between groups and is representative of the Edinburgh area [48].
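The text does not state how Cohen's d was computed. As an illustration only, the sketch below uses the common pooled-standard-deviation form, d = (mean_a − mean_b) / s_pooled, and assumes the SES measure is coded numerically (for example, SIMD quintile). The function name and the example values are hypothetical, not study data.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD.

    statistics.variance() uses the n - 1 denominator, so the pooled variance
    weights each group's sample variance by its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical SIMD quintiles (1 = most deprived) for two small groups.
term_simd = [3, 4, 5, 4, 2, 5]
preterm_simd = [2, 1, 3, 4, 1, 2]
d = cohens_d(term_simd, preterm_simd)
```

The sign of d depends only on which group is passed first; effect sizes are usually reported as absolute values, as in the range above.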

DNA methylation
Saliva samples for DNAm were collected at term equivalent age using Oragene OG-575 Assisted Collection kits (DNA Genotek, ON, Canada), and DNA was extracted using prepIT.L2P reagent (DNA Genotek, ON, Canada). Saliva was chosen because it is accessible and can be collected non-invasively; DNAm patterns measured in saliva correlate with DNAm patterns in brain and other tissues [49,50]. We chose to sample at term equivalent age to capture the allostatic load of both prenatal and early postnatal exposures.
DNA was bisulphite converted, and methylation levels were measured using the Illumina HumanMethylationEPIC BeadChip (Illumina, San Diego, CA, USA) at the Edinburgh Clinical Research Facility (Edinburgh, UK). The arrays were imaged on the Illumina iScan or HiScan platform, and genotypes were called automatically using GenomeStudio Analysis software version 2011.1 (Illumina). DNAm was processed in four batches.
Raw intensity (.idat) files were read into the R environment using minfi. The wateRmelon and minfi packages were used for preprocessing, quality control, and normalisation [51]. The pfilter function in wateRmelon was used to exclude samples in which 1% of sites had a detection p-value > 0.05, sites with a bead count < 3 in 5% of samples, and sites with a detection p-value > 0.05 in 1% of samples. Cross-hybridising probes, probes targeting single-nucleotide polymorphisms with an overall minor allele frequency ≥ 0.05, and control probes were also removed. Samples were removed if there was a mismatch between predicted sex (minfi) and recorded sex (n = 3) or if they did not meet preprocessing quality control criteria (n = 29). Data were normalised using the danet method, which includes background correction and dye bias correction [51]. Saliva contains different cell types, including buccal epithelial cells. Epithelial cell proportions were estimated by epigenetic dissection of intra-sample heterogeneity using the robust partial correlation (RPC) method implemented in the R package EpiDISH [52]. Probes located on sex chromosomes were removed before analysis. The cohort included twins (n = 32); one infant from each twin pair was randomly retained. This left a final sample size of n = 332.
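The filtering itself was done with wateRmelon's pfilter in R, whose exact rules are not reproduced here. The Python sketch below only illustrates the kind of detection-p-value and bead-count filtering described above, under stated assumptions: arrays are sites × samples, "1%" and "5%" are read as "at least that percentage" of sites or samples, and samples are filtered before sites. The function detection_pvalue_filter is hypothetical and is not part of wateRmelon.

```python
import numpy as np


def detection_pvalue_filter(det_p, bead_count, p_thresh=0.05, sample_pct=1.0,
                            site_pct=1.0, min_beads=3, bead_pct=5.0):
    """Return boolean masks (keep_sites, keep_samples).

    det_p, bead_count: arrays of shape (n_sites, n_samples).
    Assumption: a sample or site is dropped when the failing percentage is
    greater than or equal to the stated cut-off.
    """
    det_p = np.asarray(det_p, dtype=float)
    bead_count = np.asarray(bead_count)
    n_sites = det_p.shape[0]
    failed = det_p > p_thresh

    # 1) Sample filter: percentage of sites failing detection in each column.
    pct_failed_sites = 100.0 * failed.sum(axis=0) / n_sites
    keep_samples = pct_failed_sites < sample_pct
    if not keep_samples.any():
        raise ValueError("no samples pass the detection filter")

    # 2) Site filters, evaluated on the retained samples only.
    kept_failed = failed[:, keep_samples]
    kept_low_beads = bead_count[:, keep_samples] < min_beads
    n_kept = kept_failed.shape[1]
    pct_failed_samples = 100.0 * kept_failed.sum(axis=1) / n_kept
    pct_low_beads = 100.0 * kept_low_beads.sum(axis=1) / n_kept
    keep_sites = (pct_failed_samples < site_pct) & (pct_low_beads < bead_pct)
    return keep_sites, keep_samples


# Toy example: 4 sites x 3 samples.
det_p = [[0.01, 0.01, 0.20],
         [0.01, 0.01, 0.01],
         [0.01, 0.01, 0.01],
         [0.01, 0.01, 0.01]]
beads = [[10, 10, 10],
         [10, 10, 10],
         [10, 10, 10],
         [2, 10, 10]]
keep_sites, keep_samples = detection_pvalue_filter(det_p, beads)
```

In the toy example, the third sample is dropped because one of its four sites fails detection, and the fourth site is dropped because it has fewer than three beads in half of the retained samples.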

EpiScore calculation
The 104 protein EpiScores included 100 EpiScores from Gadd et al. [26], a study enriched for inflammation-related proteins, excluding those where the required CpGs were not available, owing to differences in assay
For each individual, each EpiScore was obtained by multiplying the methylation proportion at each contributing CpG by its effect size from previous studies and summing these products across CpGs. This was performed using the MethylDetectR platform [57] for the inflammatory proteins currently included in it and using R for those not currently included (CRP, GDF15, IL6, NTproBNP). All CpG sites and coefficients required to calculate the 104 EpiScores are listed in Supplementary Table 1 (Additional File 2).
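The calculation described above is a weighted linear sum, EpiScore_i = Σ_j w_j β_ij, where β_ij is the methylation proportion of individual i at CpG j and w_j is the published weight. A minimal Python sketch follows, assuming betas and weights are held in dictionaries keyed by CpG ID. It returns no score when a required CpG is missing, mirroring the exclusion of EpiScores whose CpGs were unavailable. This is not the MethylDetectR implementation, and the CpG IDs and weights in the example are placeholders.

```python
import numpy as np


def episcore(betas, weights):
    """Weighted linear sum of CpG beta values, one score per individual.

    betas:   dict mapping CpG ID -> sequence of beta values (one per individual)
    weights: dict mapping CpG ID -> published coefficient for this protein
    Returns (scores, missing_cpgs); scores is None if any required CpG is absent.
    """
    missing = [cpg for cpg in weights if cpg not in betas]
    if missing:
        # Exclude the EpiScore rather than compute it from a partial CpG set.
        return None, missing
    scores = sum(w * np.asarray(betas[cpg], dtype=float)
                 for cpg, w in weights.items())
    return scores, missing


# Placeholder CpG IDs and weights, for illustration only.
betas = {"cg_example_1": [0.2, 0.5], "cg_example_2": [0.8, 0.1]}
weights = {"cg_example_1": 1.5, "cg_example_2": -0.5}
scores, missing = episcore(betas, weights)
```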

Statistics
The predictor variables were SES and birth GA. SES was operationalised in three ways: neighbourhood-level SES using the Scottish Index of Multiple Deprivation (SIMD) [47], and two measures of family-level SES, namely maternal education (highest educational qualification) and maternal occupation (current or most recent occupation). For further details, see Supplementary eMethods, Additional File 1. Birth GA was modelled as a continuous variable to maximise statistical power [58,59]. We adjusted for GA at saliva sampling, DNAm batch, infant sex, and birthweight z-score.
All statistical analyses were performed in R (version 4.3.1) and were preregistered [60].
Principal component analysis (PCA) was used to determine the significance threshold for controlling type I error in analyses of multiple EpiScores [61]. We began with a correlation analysis, which showed absolute correlation coefficients between EpiScores of 0.01 to 0.93 (Fig. 1A). PCA was then performed to determine the number of statistical "families" among the 104 EpiScores. This yielded two principal components with eigenvalues > 1, our prespecified threshold, which explained 59.5% and 17.2% of the variance, respectively (Fig. 1B). Standardised component loadings are provided in Supplementary Table 2 (Additional File 1). In all subsequent analyses, we corrected for multiple comparisons across EpiScores and SES measures using a Bonferroni-adjusted p-value threshold of 8.3 × 10⁻³. This is 0.05/(2 × 3), where 2 reflects the two principal components for EpiScores and 3 reflects the number of SES measures used.
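A minimal Python sketch of this two-step threshold follows. It assumes PCA on the EpiScore correlation matrix (equivalent to PCA on standardised scores) and the Kaiser rule of retaining components with eigenvalue > 1. The function names are hypothetical, and the example data are simulated, not the study data.

```python
import numpy as np


def n_components_kaiser(scores):
    """Number of principal components with eigenvalue > 1.

    scores: array of shape (n_individuals, n_episcores). Eigenvalues are taken
    from the correlation matrix, i.e. PCA on standardised EpiScores.
    """
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # symmetric matrix: real eigenvalues
    return int(np.sum(eigenvalues > 1.0))


def bonferroni_threshold(n_components, n_ses_measures=3, alpha=0.05):
    """alpha divided by (number of EpiScore 'families' x number of SES measures)."""
    return alpha / (n_components * n_ses_measures)


# Simulated example: six scores driven by two independent latent factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
scores = np.repeat(factors, 3, axis=1) + 0.1 * rng.normal(size=(500, 6))
k = n_components_kaiser(scores)
threshold = bonferroni_threshold(k)
```

With two retained components and three SES measures, the threshold is 0.05/6 ≈ 8.3 × 10⁻³, the value used in the text.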
For each EpiScore as the outcome measure, we constructed general linear regression models to assess associations with birth GA, each of the three SES measures (in separate models for SIMD, maternal education, and maternal occupation), and the product interaction term SES*birth GA (removing the term if not significant), adjusting for GA at sampling, sex, batch, and birthweight z-score.
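The sketch below illustrates one such model in Python, using ordinary least squares via numpy.linalg.lstsq. Several assumptions are not stated in the text. Standardised β is obtained by z-scoring the outcome, birth GA, and SES before fitting. Categorical covariates (sex, batch) are supplied as 0/1 dummy columns. The step of dropping a non-significant interaction term, which needs p-values, is omitted. The function and variable names are hypothetical, and the data are simulated.

```python
import numpy as np


def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def fit_ga_ses_model(episcore, birth_ga, ses, covariates):
    """OLS of a z-scored EpiScore on z-scored birth GA, SES, their product,
    and covariates (dict name -> numeric column; dummy-code categories).
    Returns a dict mapping term name -> coefficient."""
    y = zscore(episcore)
    ga, s = zscore(birth_ga), zscore(ses)
    design = {"intercept": np.ones_like(y), "birth_ga": ga, "ses": s,
              "birth_ga:ses": ga * s}
    design.update({name: np.asarray(col, dtype=float)
                   for name, col in covariates.items()})
    X = np.column_stack(list(design.values()))  # same order as design keys
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(design, coef))


# Simulated data: positive GA and SES effects and a negative interaction.
rng = np.random.default_rng(1)
n = 2000
ga, ses, ga_at_sampling = rng.normal(size=(3, n))
sex = rng.integers(0, 2, size=n)
y = (0.5 * ga + 0.2 * ses - 0.1 * ga * ses + 0.3 * ga_at_sampling
     + 0.5 * rng.normal(size=n))
coefs = fit_ga_ses_model(y, ga, ses, {"ga_at_sampling": ga_at_sampling, "sex": sex})
```

A negative birth_ga:ses coefficient means that the GA–EpiScore slope becomes less positive as SES increases, which is the form of interaction reported for afamin.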
For the preterm subgroup, we additionally adjusted for perinatal inflammatory exposures known to be associated with the CRP EpiScore because, to our knowledge, this is the only DNAm proxy of an inflammatory protein that has been studied in this context [32]. These were histological chorioamnionitis (HCA), sepsis, bronchopulmonary dysplasia (BPD), and necrotising enterocolitis (NEC). Maternal smoking and preeclampsia were not associated with DNAm CRP, so they were not included as covariates [32,44]. For definitions, see Supplementary eMethods (Additional File 1), and for frequencies, see Supplementary Table 3 (Additional File 1).
For EpiScores with significant associations with GA or SES, we performed a post hoc sensitivity analysis adjusting for maternal factors that have been associated with the neonatal methylome in term infants in prior research: maternal smoking, diabetes, obesity, and mode of delivery. A change in standardised β of ≥ 20% or a change in p-value to ≥ 0.05 was considered meaningful, and we report the adjusted R² value of each model. For definitions, see Supplementary eMethods (Additional File 1), and for frequencies, see Table 1.
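The prespecified rule can be expressed as a small helper, shown here as a hypothetical Python function. It assumes that the percentage change is measured relative to the absolute value of β from the model without maternal covariates; the text does not state the denominator.

```python
def sensitivity_change(beta_main, beta_adjusted, p_adjusted,
                       beta_change_pct=20.0, p_cutoff=0.05):
    """Return (percent change in standardised beta, flag).

    The flag is True when beta changes by >= beta_change_pct percent or the
    adjusted model's p-value rises to >= p_cutoff.
    """
    if beta_main == 0:
        raise ValueError("beta_main must be non-zero")
    pct_change = 100.0 * abs(beta_adjusted - beta_main) / abs(beta_main)
    flagged = pct_change >= beta_change_pct or p_adjusted >= p_cutoff
    return pct_change, flagged


# Hypothetical betas: a 10% change with p still < 0.05 is not flagged.
pct, flag = sensitivity_change(0.50, 0.45, 0.001)
```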

Associations between gestational age and EpiScores
Gestational age was associated with 43 of the 104 EpiScores after adjustment for SIMD, maternal education, or maternal occupation (Fig. 2A-C). The proteins represented by the 43 EpiScores are listed in Table 2, categorised by functional annotation adapted from the STRING database [62]. Their broader roles in immune processes, inflammation, and the pathogenesis of neonatal diseases, where known, are described in Supplementary Table 4 (Additional File 1).
Thirty-three EpiScores were associated with low GA irrespective of the SES measure used in the model. The results for all 104 EpiScores are provided in Supplementary Figs. 1-9 (Additional File 1).

Associations between SES, EpiScores, and the effect of inflammatory comorbidities of preterm birth
Three of the 104 EpiScores were associated with SES measures or with the interaction between SES and GA (Fig. 3). There was a small association between a higher afamin EpiScore and higher maternal occupation (standardised β = 0.06, 95% confidence interval (CI) 0.02-0.11, p = 0.0082). DNAm afamin was also associated with the birth GA*maternal education interaction term: afamin correlated positively with birth GA among babies whose mothers did not have a university education and negatively with birth GA among babies whose mothers had a university education (undergraduate or postgraduate) (standardised β = , 95% CI −0.20 to −0.04, p = 0.0041, Supplementary Fig. 10, Additional File 1).
In the planned analysis of preterm-born babies only, controlling for inflammatory exposures (sepsis, HCA, NEC, and BPD) increased the proportion of variance explained (adjusted R² = 0.031-0.262 in models without and 0.047-0.392 in models with these exposures), but the SES–EpiScore associations no longer met our statistical threshold (p-values > 0.045 against an adjusted threshold of p < 8.3 × 10⁻³; see Supplementary Table 5).
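Unlike raw R², adjusted R² does not increase automatically when predictors are added, because it penalises each extra term. For reference, the standard formula is R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1), for n observations and p predictors excluding the intercept. A minimal Python version follows; the function name is hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 for n_obs observations and n_predictors predictors
    (intercept not counted)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("n_obs must exceed n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Same raw R^2, more predictors: the adjusted value is lower.
fewer = adjusted_r2(0.30, 217, 5)
more = adjusted_r2(0.30, 217, 9)
```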

Sensitivity analyses
There were few changes to the significant associations between the 43 EpiScores and GA when models were adjusted for maternal smoking, diabetes, obesity, and mode of delivery (see Supplementary Table 6, Additional File 1). There were minor changes in the proportion of variance explained (change in adjusted R² = |0.0001-0.051|), and standardised β changed by 0-12%. All adjusted models retained p < 0.05, although 4/43 (9.3%) EpiScores no longer met the adjusted p-value threshold of 8.3 × 10⁻³.
For associations between EpiScores and SES or an interaction between SES and birth GA, adjusted R² changed by |0.002-0.004| and standardised β by 6-11% (see Supplementary Table 6, Additional File 1). All

Table 2 Protein EpiScores associated with birth gestational age
Forty-three EpiScores were associated with birth gestational age in regression models adjusted for socioeconomic status. Roles adapted from the STRING database [62]. See Supplementary Table

Discussion
In this study, we identified associations between low GA at birth and 43 EpiScores from a set enriched for inflammatory proteins. Few EpiScores were associated with SES within the whole sample, and these associations were partially attenuated in preterm infants who experienced inflammatory comorbidities. To our knowledge, this is the first study to assess the impact of preterm birth and social status using epigenetic signatures designed to reflect the circulating proteome.

Associations between SES measures and EpiScores
SES appears to play a much smaller role than birth GA in the patterning of EpiScores. SES measures, or interactions between birth GA and SES measures, were associated with only three of the 104 EpiScores studied: afamin, ICAM5, and HGFI. Afamin and ICAM5 were positively associated with maternal occupation and SIMD, respectively. Afamin and HGFI were both associated with the interaction term between SES and birth GA. Afamin is involved in vitamin E transport [108], ICAM5 has a role in microglial regulation [109], and HGFI is a macrophage-stimulating protein [110]. The relationships between these proteins and SES have not previously been investigated, although afamin is associated with the development of metabolic syndrome [111], which varies with SES [112]. Among preterm infants, no SES–EpiScore associations survived adjustment for inflammatory exposures, which suggests that the weak effects of SES on the neonatal proteome that we observed in a small number of EpiScores are at least partially accounted for by inflammatory pathologies in early life. Taken together, the results suggest that immune dysregulation, proxied by EpiScores, may not be the primary axis through which SES becomes embedded in the development of preterm infants during neonatal intensive care.
SES has been consistently associated with inflammation in adulthood, including in relation to childhood deprivation [15,113], but less is known about the relationship between SES and inflammation in the neonatal period. A longitudinal study by Leviton et al. [114], with five sampling time points during the first month of life after preterm birth, showed that maternal eligibility for Medicaid was associated with levels of 14 inflammatory proteins (IL6R, TNFR1, TNFR2, IL8, ICAM1, VCAM1, TSH, EPO, bFGF, IGF1, VEGF, PIGF, Ang-1, Ang-2). However, only three were significant at more than one time point during the month after preterm birth (TSH, bFGF, Ang-1), and none was associated at all five time points. Taken together with our results, these findings suggest that the impact of SES on immune regulation is relatively modest and inconsistent in the newborn period but accrues through to adulthood. Further research is required to understand how and when SES becomes embedded in child development and whether early life events such as preterm birth modify that process; EpiScores could be a powerful tool for investigating the temporal dynamics of social determinants of child health.

Sensitivity analyses
Post hoc analyses showed a potentially small effect of maternal variables on the neonatal methylome in this cohort, which is enriched for preterm birth and was sampled at term equivalent age. This is consistent with previous findings of no association between smoking or preeclampsia and DNAm within this cohort [32]. This may be due to the reduced in utero exposure of preterm infants to maternal factors, or to sampling at term equivalent age, such that neonatal unit exposures may outweigh maternal factors.

Strengths and limitations
Strengths of this study include the large sample of term and preterm neonates; to the best of our knowledge, this is the first examination of multiple DNA methylation-based estimators of circulating proteins in a neonatal sample. We derived EpiScores from minimally invasive sampling (buccal cells from saliva), which avoids the ethical challenge of venepuncture for research in children. The EpiScores, serving as proxies of inflammatory proteins and sampled at term equivalent age in preterm infants, were selected for their potential to capture chronic, cumulative inflammation associated with preterm birth and neonatal intensive care exposures [23][24][25]. We adjusted for variables associated with DNAm and additionally for inflammatory exposures to increase the clinical validity of our results.
The study has some limitations. The EpiScores used were developed in adult cohorts [26-28, 45] and have not been validated against neonatal protein levels. A validation study would be challenging because the phasic nature of circulating proteins and maturational variation in protein expression would require serial venepuncture, which presents ethical and practical barriers in preterm infants. Of note, we have previously established that neonatal DNAm CRP scores correlate with cumulative clinical inflammatory exposures, which is corroborative evidence that the score developed in adults is relevant in neonates [32]. The 104 EpiScores we tested explain 1-58% of the variance in protein levels [26-28, 45]. However, even those that capture a relatively low proportion of the variance are associated with incident diseases such as cardiovascular disease and type 2 diabetes, and with cognitive function and brain health [26, 115-117]. This magnitude of variance explained is also comparable to that achieved with polygenic risk scores, which have proved useful in risk stratification [118][119][120]. The EpiScores were also trained using blood samples [26-28, 45], whereas we have projected these scores into saliva samples. However, previous studies have successfully used similar cross-tissue techniques [32,121,122], and in neonates saliva provides a non-invasive and accessible sampling method. Not all inflammation-related proteins are represented, as we were limited by the available EpiScores, so we may have underestimated the full complexity of the relationship between birth GA, SES, and inflammation.
Longitudinal investigations are imperative for elucidating whether the DNA methylation signatures associated with gestational age identified in this study exert a causal influence on inflammation-associated mechanisms in preterm birth. It remains crucial to discern whether these signatures are a direct downstream consequence of GA itself or are induced by specific factors correlated with GA that are not necessarily driven by chronic inflammation. Mendelian randomisation studies, integrating genomic and epigenomic determinants, are a promising methodological approach to disentangle the directionality of these intricate relationships. The sensitivity analyses suggested a potential weak effect of maternal exposures on associations between GA and EpiScores at term equivalent age. However, the prevalence of some maternal factors was low; for example, 18/332 (5.4%) mothers had diabetes. Therefore, larger samples enriched for the variable of interest, with longitudinal sampling from birth, would be required to investigate the relative contributions of antenatal/intrapartum versus postnatal events to the associations we observed.
The study population is comparable to other neonatal populations in high-income, majority white settings, but these results may not generalise to settings with different socioeconomic or ethnicity profiles.We studied several measures of SES but were not able to include all that could be relevant, such as household income.

Conclusion
We identified 43 EpiScores, from a set enriched for inflammatory proteins, that were associated with low birth GA. The proteins represented by these 43 EpiScores offer novel insights into the physiological response to preterm birth and warrant further study to explore their role in the relationship between preterm birth, inflammation, and longer-term outcomes. We found only three EpiScores associated with SES in the neonatal period, none of which survived adjustment for perinatal pro-inflammatory exposures in preterm infants, suggesting that inflammation is unlikely to be the primary axis through which SES becomes embedded in the development of preterm infants in the neonatal period.

Fig. 1 Determining the significance threshold. Principal component analysis was used to determine the adjusted statistical significance threshold, given multiple statistical comparisons. A A correlation matrix of 104 EpiScores, showing significant (p < 0.05) correlation coefficients in red for positive and blue for negative associations. B A scree plot of principal components, with the eigenvalue for each component. Standardised component loadings for principal components one and two are provided in Supplementary Table 3 (Additional File 1)

Fig. 3 EpiScores associated with socioeconomic status or an interaction between socioeconomic status and birth gestational age. EpiScores associated with socioeconomic status (Scottish Index of Multiple Deprivation, maternal education, or maternal occupation), or with an interaction between socioeconomic status and birth gestational age. Regression models included gestational age at birth, gestational age at sampling, birthweight z-score, sex, and methylation processing batch. Sample sizes: n = 331 for the Scottish Index of Multiple Deprivation, n = 323 for maternal education, and n = 328 for maternal occupation. Three of 104 EpiScores were significant (Bonferroni-adjusted p-value < 8.3 × 10⁻³). Points and bars represent standardised beta and 95% confidence intervals, with red indicating positive and blue negative associations. CI Confidence interval, GA Gestational age, HGFI Hepatocyte growth factor-like protein alpha chain, ICAM5 Intercellular adhesion molecule 5, SIMD Scottish Index of Multiple Deprivation